VTLN-based Rapid Cross-lingual Adaptation for Statistical Parametric Speech Synthesis

Authors

  • Lakshmi Saheer
  • Hui Liang
  • John Dines
  • Philip N. Garner
Abstract

Cross-lingual speaker adaptation (CLSA) has emerged as a new challenge in statistical parametric speech synthesis, with specific application to speech-to-speech translation. Recent research has shown that reasonable speaker similarity can be achieved in CLSA using maximum likelihood linear transformation of model parameters, but this method also has weaknesses due to the inherent mismatch caused by the differing phonetic inventories of languages. In this paper, we propose that fast and effective CLSA can be achieved using vocal tract length normalization (VTLN), where the strong constraints of the vocal tract warping function may actually help to avoid the most severe effects of the aforementioned mismatch. VTLN has a single parameter that warps the spectrum. Using shifted or adapted pitch, VTLN can still achieve reasonable speaker similarity. We present our approach, VTLN-based CLSA, and evaluation results that support our proposal under the limitation that the voice identity and speaking style of a target speaker do not diverge too far from those of the average voice model.
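To illustrate what a single-parameter spectral warp can look like, here is a minimal sketch. It assumes the bilinear (first-order all-pass) warping function, which is a common choice for VTLN; the abstract does not state which warping function the authors use. The names `bilinear_warp`, `warp_axis` and `alpha` are hypothetical and chosen only for this example.

```python
import math


def bilinear_warp(omega, alpha):
    """Warp a normalized angular frequency omega (0..pi) with one parameter.

    Assumption: bilinear (all-pass) warping; alpha is the warping factor,
    with |alpha| < 1. alpha = 0 leaves the frequency axis unchanged.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between -1 and 1")
    # Phase response of a first-order all-pass filter.
    return omega + 2.0 * math.atan2(
        alpha * math.sin(omega), 1.0 - alpha * math.cos(omega)
    )


def warp_axis(num_bins, alpha):
    """Return the warped positions of num_bins points spaced evenly on [0, pi]."""
    return [
        bilinear_warp(math.pi * k / (num_bins - 1), alpha)
        for k in range(num_bins)
    ]


if __name__ == "__main__":
    # Print original and warped frequencies for an illustrative alpha.
    for original, warped in zip(warp_axis(5, 0.0), warp_axis(5, 0.1)):
        print(f"{original:.3f} -> {warped:.3f}")
```

Under this assumed form, the end points 0 and pi stay fixed and the mapping is monotonic, so one scalar stretches or compresses the whole frequency axis. That is the kind of strong constraint the abstract refers to.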

Similar resources

Combining Vocal Tract Length Normalization with Linear Transformations in a Bayesian Framework

Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original average voice model. By contrast, with just a single parameter, VTLN c...

Framework Of Feature Based Adaptation For Statistical Speech Synthesis And Recognition

The advent of statistical parametric speech synthesis has paved the way to a unified framework for hidden Markov model (HMM) based text-to-speech synthesis (TTS) and automatic speech recognition (ASR). The techniques and advancements made in the field of ASR can now be adopted in the domain of synthesis. Speaker adaptation is a well-advanced topic in the area of ASR, where the adaptation data ...

Bias Adaptation for Vocal Tract Length Normalization

Vocal tract length normalisation (VTLN) is a well-known rapid adaptation technique. VTLN as a linear transformation in the cepstral domain results in scaling and translation factors. The warping factor represents the spectral scaling parameter, while the translation factor, represented by a bias term, captures more speaker characteristics, especially in a rapid adaptation framework without havi...

Personalising speech-to-speech translation: Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis

In this paper we present results of unsupervised cross-lingual speaker adaptation applied to text-to-speech synthesis. The application of our research is the personalisation of speech-to-speech translation in which we employ an HMM statistical framework for both speech recognition and synthesis. This framework provides a logical mechanism to adapt synthesised speech output to the voice of the us...

State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis

A phone mapping-based method has been introduced for cross-lingual speaker adaptation in HMM-based speech synthesis. In this paper, we go on to propose a state mapping-based method for cross-lingual speaker adaptation, where the state mapping between voice models in the source and target languages is established under the minimum Kullback-Leibler divergence (KLD) criterion. We introduce two approach...

Publication date: 2012